
    A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation

    We propose a generalized focal loss function based on the Tversky index to address the issue of data imbalance in medical image segmentation. Compared to the commonly used Dice loss, our loss function achieves a better trade-off between precision and recall when training on small structures such as lesions. To evaluate our loss function, we improve the attention U-Net model by incorporating an image pyramid to preserve contextual features. We experiment on the BUS 2017 and ISIC 2018 datasets, where lesions occupy 4.84% and 21.4% of the image area, and improve segmentation accuracy compared to the standard U-Net by 25.7% and 3.6%, respectively. Comment: submitted to the 2019 IEEE International Symposium on Biomedical Imaging (ISBI).
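    The loss builds on the Tversky index, TI = TP / (TP + α·FN + β·FP), whose complement is raised to a focusing exponent so that hard, small-lesion examples dominate training. A minimal PyTorch sketch of a binary focal Tversky loss follows; the default parameter values and the exponent convention are illustrative, not the authors' released code.

```python
import torch

def focal_tversky_loss(y_pred, y_true, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for binary segmentation (a sketch, not the authors' code).

    y_pred: predicted foreground probabilities, shape (N, H, W)
    y_true: binary ground-truth masks, same shape
    alpha weights false negatives, beta weights false positives.
    """
    y_pred = y_pred.reshape(y_pred.size(0), -1)
    y_true = y_true.reshape(y_true.size(0), -1)
    tp = (y_pred * y_true).sum(dim=1)
    fn = ((1 - y_pred) * y_true).sum(dim=1)
    fp = (y_pred * (1 - y_true)).sum(dim=1)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    # Focal modulation: exponent conventions differ between implementations
    # ((1 - TI)**gamma vs. (1 - TI)**(1/gamma)); gamma here is illustrative.
    return torch.mean((1 - tversky) ** gamma)
```

    Setting α > β penalizes false negatives more heavily, which is what shifts the precision-recall trade-off towards recall on small structures such as lesions.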

    A Novel Image-centric Approach Towards Direct Volume Rendering

    Transfer Function (TF) generation is a fundamental problem in Direct Volume Rendering (DVR). A TF maps voxels to color and opacity values to reveal inner structures. Existing TF tools are complex and unintuitive for users, who are more likely to be medical professionals than computer scientists. In this paper, we propose a novel image-centric method for TF generation in which, instead of using complex tools, the user directly manipulates the volume data to generate the DVR. The user's work is further simplified by presenting only the most informative volume slices for selection. Based on the selected parts, the voxels are classified using our novel Sparse Nonparametric Support Vector Machine classifier, which combines both local and near-global distributional information of the training data. The voxel classes are mapped to aesthetically pleasing and distinguishable color and opacity values using harmonic colors. Experimental results on several benchmark datasets and a detailed user survey show the effectiveness of the proposed method. Comment: To appear in ACM Transactions on Intelligent Systems and Technology.
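    As a rough illustration of the final step only, the sketch below maps per-voxel class labels to RGBA entries of a transfer function. The Sparse Nonparametric SVM classifier and the harmonic color selection are the paper's contributions and are not reproduced here; the evenly spaced hues and the default opacity ramp are stand-in assumptions.

```python
import colorsys
import numpy as np

def classes_to_rgba(labels, opacities=None):
    """Map per-voxel class labels to RGBA for DVR (illustrative sketch only).

    labels: integer array of voxel class labels, e.g. shape (X, Y, Z).
    opacities: optional per-class opacity in [0, 1]; defaults to equal steps.
    Hues are spread evenly around the color wheel as a stand-in for the
    harmonic color selection described in the paper.
    """
    n_classes = int(labels.max()) + 1
    if opacities is None:
        opacities = np.linspace(0.1, 0.9, n_classes)
    rgba_table = np.zeros((n_classes, 4))
    for c in range(n_classes):
        rgba_table[c, :3] = colorsys.hsv_to_rgb(c / n_classes, 0.6, 0.9)
        rgba_table[c, 3] = opacities[c]
    return rgba_table[labels]  # shape labels.shape + (4,)
```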

    Multi-level Stress Assessment Using Multi-domain Fusion of ECG Signal

    Stress analysis and assessment of affective states of mind using the ECG as a physiological signal is an active research topic in biomedical signal processing. However, existing literature provides only a binary assessment of stress, while multiple levels of assessment may be more beneficial for healthcare applications. Furthermore, in present research the ECG signal is examined independently in the spatial domain or in transform domains, but the advantage of fusing these domains has not been fully exploited. To get the maximum benefit from fusing different domains, we introduce a dataset with multiple stress levels and then classify these levels using a novel deep learning approach that converts the ECG signal into signal images based on R-R peaks, without any feature extraction. Moreover, we make the signal images multimodal and multidomain by converting them into the time-frequency and frequency domains using the Gabor wavelet transform (GWT) and the Discrete Fourier Transform (DFT), respectively. Convolutional Neural Networks (CNNs) are used to extract features from the different modalities, and decision-level fusion is then performed to improve the classification accuracy. Experimental results on an in-house dataset collected from 15 users show that, with the proposed fusion framework and the ECG signal-to-image conversion, we reach an average accuracy of 85.45%.
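    A hedged sketch of the two preprocessing steps described above: stacking R-peak-aligned beats into a 2-D signal image and taking its DFT magnitude as the frequency-domain view. The peak-detection settings, beat window, and image size are assumptions; the paper's exact construction (and its Gabor wavelet transform branch) may differ.

```python
import numpy as np
from scipy.signal import find_peaks, resample

def ecg_to_signal_image(ecg, fs, beat_len=128):
    """Stack R-peak-aligned beats into a 2-D 'signal image' (illustrative sketch).

    ecg: 1-D ECG trace, fs: sampling rate in Hz. R peaks are found with a
    simple prominence-based detector; the paper's exact construction may differ.
    """
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), prominence=np.std(ecg))
    half = int(0.3 * fs)
    beats = [resample(ecg[p - half:p + half], beat_len)
             for p in peaks if p - half >= 0 and p + half <= len(ecg)]
    return np.vstack(beats)  # rows = beats, columns = within-beat samples

def to_frequency_domain(signal_image):
    """Frequency-domain view of the signal image via the 2-D DFT magnitude."""
    return np.abs(np.fft.fftshift(np.fft.fft2(signal_image)))
```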

    Multidomain Multimodal Fusion For Human Action Recognition Using Inertial Sensors

    One of the major reasons for misclassification of multiplex actions during action recognition is the unavailability of complementary features that provide semantic information about the actions. In different domains these features are present at different scales and intensities. In the existing literature, features are extracted independently in different domains, but the benefits of fusing these multidomain features have not been realized. To address this challenge and to extract a complete set of complementary information, in this paper we propose a novel multidomain multimodal fusion framework that extracts complementary and distinct features from different domains of the input modality. We transform the input inertial data into signal images, and then make the input modality multidomain and multimodal by transforming the spatial-domain information into the frequency and time-spectrum domains using the Discrete Fourier Transform (DFT) and the Gabor wavelet transform (GWT), respectively. Features in the different domains are extracted by Convolutional Neural Networks (CNNs) and then fused by Canonical Correlation based Fusion (CCF) to improve the accuracy of human action recognition. Experimental results on three inertial datasets show the superiority of the proposed method when compared to the state of the art.
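    Canonical Correlation based Fusion can be sketched with scikit-learn's CCA: both feature sets are projected onto maximally correlated directions and the projections are combined. The number of components and the choice of concatenation over summation are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_fusion(feats_a, feats_b, n_components=50):
    """Canonical-correlation-based fusion of two CNN feature sets (a sketch).

    feats_a, feats_b: arrays of shape (n_samples, d_a) and (n_samples, d_b),
    e.g. features from the frequency-domain and time-spectrum-domain CNNs.
    Projects both sets onto maximally correlated directions and fuses them.
    """
    cca = CCA(n_components=n_components)
    a_c, b_c = cca.fit_transform(feats_a, feats_b)
    return np.hstack([a_c, b_c])  # serial (concatenation) fusion
    # Alternative design choice: summation fusion, i.e. a_c + b_c
```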

    Machine Learning on Biomedical Images: Interactive Learning, Transfer Learning, Class Imbalance, and Beyond

    In this paper, we highlight three issues that limit the performance of machine learning on biomedical images and tackle them through three case studies: 1) Interactive Machine Learning (IML): we show how IML can drastically improve the exploration time and quality of direct volume rendering; 2) Transfer learning: we show how transfer learning, along with intelligent pre-processing, can yield better Alzheimer's diagnosis using a much smaller training set; 3) Data imbalance: we show how our novel focal Tversky loss function can provide better segmentation results by taking into account the imbalanced nature of segmentation datasets. The case studies are accompanied by an in-depth analytical discussion of results along with possible future directions. Comment: Accepted at IEEE MIPR 2019. arXiv admin note: text overlap with arXiv:1810.0784

    Motion Vector Extrapolation for Video Object Detection

    Despite the continued success of computationally efficient deep neural network architectures for video object detection, performance continually arrives at the great trilemma of speed versus accuracy versus computational resources (pick two). Current attempts to exploit temporal information in video data to overcome this trilemma are bottlenecked by the state of the art in object detection models. We present MOVEX, a technique which performs video object detection by running off-the-shelf object detectors alongside existing optical flow based motion estimation techniques in parallel. Through a set of experiments on the benchmark MOT20 dataset, we demonstrate that our approach significantly reduces the baseline latency of any given object detector without sacrificing accuracy. Further latency reduction, up to 25x lower than the original latency, can be achieved with minimal accuracy loss. MOVEX enables low-latency video object detection on common CPU based systems, thus allowing for high performance video object detection beyond the domain of GPU computing. The code is available at https://github.com/juliantrue/movex
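    The general idea of propagating keyframe detections with motion estimation can be sketched with OpenCV's dense Farneback flow: average the flow inside each box and shift the box accordingly. This is a simplified illustration under assumed inputs, not the MOVEX implementation.

```python
import numpy as np
import cv2

def extrapolate_boxes(prev_gray, curr_gray, boxes):
    """Shift detections from a keyframe to the next frame using dense optical
    flow (a sketch of the general idea, not the MOVEX implementation).

    prev_gray, curr_gray: consecutive grayscale frames (H, W), dtype uint8.
    boxes: list of integer (x1, y1, x2, y2) detections on the previous frame.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    moved = []
    for x1, y1, x2, y2 in boxes:
        # Average motion vector inside the box approximates its displacement.
        dx = float(np.mean(flow[y1:y2, x1:x2, 0]))
        dy = float(np.mean(flow[y1:y2, x1:x2, 1]))
        moved.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return moved
```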

    Facial Expression Recognition Under Partial Occlusion from Virtual Reality Headsets based on Transfer Learning

    Facial expressions of emotion are a major channel in our daily communications, and they have been the subject of intense research in recent years. To automatically infer facial expressions, convolutional neural network based approaches have become widely adopted due to their proven applicability to the Facial Expression Recognition (FER) task. On the other hand, Virtual Reality (VR) has gained popularity as an immersive multimedia platform, where FER can provide enriched media experiences. However, recognizing facial expressions while wearing a head-mounted VR headset is a challenging task, as the upper half of the face is completely occluded. In this paper we attempt to overcome these issues and focus on facial expression recognition in the presence of severe occlusion, where the user is wearing a head-mounted display in a VR setting. We propose a geometric model to simulate the occlusion resulting from a Samsung Gear VR headset that can be applied to existing FER datasets. Then, we adopt a transfer learning approach, starting from two pretrained networks, namely VGG and ResNet, and further fine-tune them on the FER+ and RAF-DB datasets. Experimental results show that our approach achieves comparable results to existing methods while training on three modified benchmark datasets that adhere to realistic occlusion resulting from wearing a commodity VR headset. Code for this paper is available at: https://github.com/bita-github/MRP-FER Comment: To be presented at the IEEE BigMM 202
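    A minimal transfer-learning baseline in the spirit described above: load an ImageNet-pretrained backbone from torchvision, replace the classifier head, and fine-tune on the occlusion-augmented data. The choice of ResNet-50, the number of classes, and the freezing policy are assumptions rather than the authors' exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_fer_model(num_classes=8, freeze_backbone=True):
    """Transfer-learning baseline for occluded FER (illustrative sketch).

    Starts from an ImageNet-pretrained ResNet-50 and replaces the classifier
    head; num_classes and the freezing policy are assumptions, not the
    authors' exact configuration.
    """
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    if freeze_backbone:
        for p in model.parameters():
            p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # trainable head
    return model

# Fine-tune only the trainable parameters on the occlusion-augmented dataset.
model = build_fer_model()
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad),
                             lr=1e-4)
```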

    Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors

    Multimodal fusion frameworks for Human Action Recognition (HAR) using depth and inertial sensor data have been proposed over the years. In most of the existing works, fusion is performed at a single level (feature level or decision level), missing the opportunity to fuse rich mid-level features necessary for better classification. To address this shortcoming, in this paper we propose three novel deep multilevel multimodal fusion frameworks to capitalize on different fusion strategies at various stages and to leverage the advantages of multilevel fusion. At the input, we transform the depth data into depth images called sequential front view images (SFIs) and the inertial sensor data into signal images. Each input modality, depth and inertial, is further made multimodal by convolving it with the Prewitt filter. Creating a "modality within modality" enables further complementary and discriminative feature extraction through Convolutional Neural Networks (CNNs). CNNs are trained on the input images of each modality to learn low-level, high-level and complex features. The learned features are extracted and fused at different stages of the proposed frameworks to combine discriminative and complementary information. These highly informative features serve as input to a multi-class Support Vector Machine (SVM). We evaluate the proposed frameworks on three publicly available multimodal HAR datasets, namely the UTD Multimodal Human Action Dataset (MHAD), Berkeley MHAD, and UTD-MHAD Kinect V2. Experimental results show the superiority of the proposed fusion frameworks over existing methods. Comment: 10 pages, 13 figures
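    The "modality within modality" step can be sketched as a Prewitt gradient image derived from each input image, giving a second stream for a separate CNN. The use of the gradient magnitude of the horizontal and vertical responses is an assumption for illustration; the paper's exact filtering may differ.

```python
import numpy as np
from scipy.ndimage import prewitt

def modality_within_modality(image):
    """Derive a second 'modality' from an input image via Prewitt filtering
    (a sketch of the idea; the paper's exact kernel and usage may differ).

    image: 2-D array, e.g. a sequential front view image (SFI) or signal image.
    Returns the gradient-magnitude image from horizontal and vertical Prewitt
    responses, on which a separate CNN can then be trained.
    """
    gx = prewitt(image.astype(float), axis=0)
    gy = prewitt(image.astype(float), axis=1)
    return np.hypot(gx, gy)
```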

    Towards Improved Human Action Recognition Using Convolutional Neural Networks and Multimodal Fusion of Depth and Inertial Sensor Data

    This paper aims to improve the accuracy of Human Action Recognition (HAR) by fusing depth and inertial sensor data. First, we transform the depth data into Sequential Front View Images (SFI) and fine-tune the pre-trained AlexNet on these images. Then, the inertial data is converted into Signal Images (SI) and another convolutional neural network (CNN) is trained on these images. Finally, learned features are extracted from both CNNs, fused together to form a shared feature layer, and fed to a classifier. We experiment with two classifiers, namely Support Vector Machines (SVM) and a softmax classifier, and compare their performances. The recognition accuracies of each individual modality, depth data alone and inertial sensor data alone, are also calculated and compared with the fusion-based accuracies to highlight that fusing the modalities yields better results than using them individually. Experimental results on the UTD-MHAD and Kinect 2D datasets show that the proposed method achieves state-of-the-art results when compared to other recently proposed visual-inertial action recognition methods.
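    The fusion stage described above amounts to concatenating the two CNN feature vectors into a shared feature layer and training a classifier on it. A sketch with a linear SVM follows; the feature dimensions and SVM settings are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

def fuse_and_classify(depth_feats, inertial_feats, labels):
    """Feature-level fusion of the two CNN streams followed by an SVM (a sketch).

    depth_feats, inertial_feats: (n_samples, d1) and (n_samples, d2) features
    extracted from the depth-image CNN and the signal-image CNN.
    labels: integer action labels of shape (n_samples,).
    """
    fused = np.hstack([depth_feats, inertial_feats])  # shared feature layer
    clf = SVC(kernel="linear")
    clf.fit(fused, labels)
    return clf
```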

    Deep Clustering with a Dynamic Autoencoder: From Reconstruction towards Centroids Construction

    In unsupervised learning, there is no apparent, straightforward cost function that can capture the significant factors of variation and similarity. Since natural systems have smooth dynamics, an opportunity is lost if an unsupervised objective function remains static during the training process. The absence of concrete supervision suggests that smooth dynamics should be integrated. Compared to classical static cost functions, dynamic objective functions make better use of the gradual and uncertain knowledge acquired through pseudo-supervision. In this paper, we propose the Dynamic Autoencoder (DynAE), a novel model for deep clustering that overcomes the clustering-reconstruction trade-off by gradually and smoothly eliminating the reconstruction objective function in favor of a construction one. Experimental evaluations on benchmark datasets show that our approach achieves state-of-the-art results compared to the most relevant deep clustering methods.
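    DynAE's defining mechanism, gradually replacing reconstruction targets with centroid construction, is not reproduced here. The sketch below only illustrates the simpler idea of an objective that is smoothly annealed from reconstruction towards centroid attraction over training; the schedule and the loss terms are assumptions.

```python
import torch
import torch.nn.functional as F

def dynamic_objective(x, x_recon, z, centroids, assignments, epoch, total_epochs):
    """Smoothly annealed clustering objective (illustrative sketch only).

    x, x_recon: input batch and its reconstruction.
    z: latent embeddings; centroids: cluster centers in latent space;
    assignments: LongTensor of current cluster assignments per sample.
    """
    w = min(1.0, epoch / float(total_epochs))  # 0 -> reconstruction, 1 -> clustering
    recon_loss = F.mse_loss(x_recon, x)
    cluster_loss = F.mse_loss(z, centroids[assignments])
    return (1.0 - w) * recon_loss + w * cluster_loss
```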